To make a movie from the data contained in the gdpm package, we also need the poseid package for the visualization and the package animation to make a movie.
> devtools::install_github("choisy/gdpm")
> devtools::install_github("choisy/poseid")
> install.packages("animation")
Loading and attaching the packages:
> library(gdpm)
> library(poseid)
> library(animation)
We also need other packages for the preparation of the data, for the geographical data and for the population size information.
> # Important packages
> library(dplyr) # for 'select', 'filter', 'mutate'....
> library(gso) # for pop_size
The first important step is to look at the data and to decide a way to choose the more interesting intervals to show the variation of the number of cases across the year.
For this example, we will take the dengue disease from 1995 to 2015.
First, let’s load the dengue dataset which is contained in the gdpm package and can be loaded with the getid function. For more details please see the documentation and the vignettes in the gdpm package.
> # Load the dataset from the gdpm package
> dengue <- getid(dengue, from = 1995)
> dim(dengue)
[1] 13104 5
> range(dengue$year)
[1] 1995 2015
> head(dengue)
province year month incidence_dengue mortality_dengue
1 An Giang 1995 January 34 1
2 An Giang 1995 February 58 0
3 An Giang 1995 March 39 0
4 An Giang 1995 April 54 0
5 An Giang 1995 May 76 0
6 An Giang 1995 June 175 0
It is important to normalize the data before visualization, for that we will use the size of the population and calculate the incidence rate per province. We can obtain the population size information from the gso package. The data need to be expressed by the same province as the dengue data frame. For that, we can use the merge_prov function in the poseid package.
For more informations:
> ?merge_prov
> ?pop_size
Load the pop_size data frame from the gso package:
> pop <- gso::pop_size
> head(pop)
province year female male rural total urban
1 An Giang 1995 1003.4 966.7 1602.1 1970.1 368.0
2 An Giang 1996 1012.6 978.1 1619.8 1990.7 370.9
3 An Giang 1997 1022.0 989.1 1616.0 2011.1 395.1
4 An Giang 1998 1031.4 1001.1 1634.5 2032.5 398.0
5 An Giang 1999 1043.9 1011.5 1650.1 2055.4 405.3
6 An Giang 2000 1047.6 1014.1 1604.2 2061.7 457.5
The data in the pop data frame need to be merged back together to be expressed by the province in 1995, to have the same data geographical definition as the dengue data frame.
> pop <- merge_prov(pop, from = "1995-01-01")
> head(pop)
# A tibble: 6 x 4
province year key value
<chr> <int> <chr> <dbl>
1 An Giang 1995 female 1003.4
2 An Giang 1995 male 966.7
3 An Giang 1995 rural 1602.1
4 An Giang 1995 total 1970.1
5 An Giang 1995 urban 368.0
6 An Giang 1996 female 1012.6
> identical(unique(pop$province), unique(dengue$province))
[1] TRUE
Only the population total is selected and the column value is renamed population:
> pop <- filter(pop, key == "total")
> pop <- select(pop, province, year, population = value)
> head(pop)
# A tibble: 6 x 3
province year population
<chr> <int> <dbl>
1 An Giang 1995 1970.1
2 An Giang 1996 1990.7
3 An Giang 1997 2011.1
4 An Giang 1998 2032.5
5 An Giang 1999 2055.4
6 An Giang 2000 2061.7
The pop_size contains the population sizes expressed in thousand people, so we multiply the data by 1000 to have the data expressed in people.
> pop <- mutate(pop, population = population * 1000)
> head(pop)
# A tibble: 6 x 3
province year population
<chr> <int> <dbl>
1 An Giang 1995 1970100
2 An Giang 1996 1990700
3 An Giang 1997 2011100
4 An Giang 1998 2032500
5 An Giang 1999 2055400
6 An Giang 2000 2061700
After joining the pop and the dengue data frames by province and by year, we can now calculate the incidence rate per month, year and province : \[Incidence\ rate = \frac{number\ of\ case}{population\ total\ \times\ 10000\ persons}\]
> # join the population dataframe to the dengue dataset and calculate the incidence rate
> dengue <- left_join(dengue, pop, by = c("province", "year"))
> dengue <- mutate(dengue, incidence_rate = (incidence_dengue / population)*10000)
> head(dengue)
province year month incidence_dengue mortality_dengue population
1 An Giang 1995 January 34 1 1970100
2 An Giang 1995 February 58 0 1970100
3 An Giang 1995 March 39 0 1970100
4 An Giang 1995 April 54 0 1970100
5 An Giang 1995 May 76 0 1970100
6 An Giang 1995 June 175 0 1970100
incidence_rate
1 0.1725801
2 0.2944013
3 0.1979595
4 0.2740978
5 0.3857672
6 0.8882798
It can be interesting to test different methods to select the best intervals. For that, we can use the function breaks in the poseid package. Many different methods of selection of intervals are implemented in the breaks function and you can try different method just by changing the style parameters.
For more information, see the Vizualizing the gdpm package vignettes in the gdpm package and/or:
> ?breaks # information on the breaks function
> ?classInt::classIntervals # information on the style used in the breaks function
As the data contain 13104 rows, some methods can take a long time to calculate the intervals. So 50% of the data are randomly selected and the methods of selection for the intervals can be used on this sample to save time and calculation.
> sample_dengue <- sample_frac(dengue, 0.5)
Here, we show the example of the quantile and the fisher methods. The function breaks has a parameter distrib, which permits to print the distribution of the value by breaks. It allows you to see the difference in distribution between different style.
First, the quantile method which provides quantile breaks on the incidence rate:
> quant <- breaks(sample_dengue, "incidence_rate", n = 6, style = "quantile", pal = rev(heat.colors(6)), distribution = TRUE)
We can print the intervals:
> attr(quant, "breaks")
[1] 0.0000000 0.0000000 0.0000000 0.1039470 0.4856423 1.4238073
[7] 76.6360099
Second, the fisher method which use the algorithm proposed by W. D. Fisher (1958) and discussed by Slocum et al. (2005) as the Fisher-Jenks algorithm:
> fisher <- breaks(sample_dengue, "incidence_rate", n = 6, style = "fisher", pal = rev(heat.colors(6)), distribution = TRUE)
We can print the intervals:
> attr(fisher, "breaks")
[1] 0.000000 1.011113 3.363738 7.881011 16.642031 41.763468 76.636010
We can create a consensus vector of intervals to represent our data.
> # Create a consensus:
> breaks_den <- c(0, 0.1, 1, 4, 8, 22, max(na.omit(dengue$incidence_rate)))
> # Look at the distribution of the data:
> dengue_breaks <- breaks(dengue, "incidence_rate", n = 6, style = "fixed", fixedBreaks = breaks_den, pal = rev(heat.colors(6)), distribution = TRUE)
To print the distribution of the value with the intervals printed with different colors:
> # Create a palette
> pal <- colorRampPalette(rev(heat.colors(6)))
> pal_breaks <- pal(6)[cut(na.omit(dengue_breaks$incidence_rate),
+ breaks = attr(dengue_breaks, "breaks"))]
> # Plot the distribution of the value
> plot(na.omit(dengue_breaks$incidence_rate), xlab = "time", pch = 20, col = pal_breaks)
For the visualization, we will need the package gadmVN for the map of the provinces of Vietnam and the package magrittr for the pipe.
> library(magrittr) # `%>%`
> library(gadmVN) # for 'gadm'
> map <- gadmVN::gadm(date = 1995, merge_hanoi = TRUE)
To print a choropleth map for the dengue dataset for January 2009, we can use a pipe to prepare the data and draw a choropleth map with a legend: For more detail, please see the vignette: Vizualizing the gdpm package and/or:
> ?choromap
> filter(dengue, year == 2009, month == "January") %>%
+ select(province, contains("rate")) %>%
+ choromap(map, col = rev(heat.colors(6)), fixedBreaks = breaks_den) %>%
+ legend2(legend = ., col = attr(., "colors"), col_na = "grey", n_round = 2)
To draw a choropleth map of another month or another year, we just have to change the first line of the pipe: filter(dengue, year == 2009, month == "January"). So, to make a movie of one year, we just have to create a loop printing a map for each month.
For more details, you can look at the webpage of the animation package: https://yihui.name/animation/ and:
> ?animation::saveGIF
> months <- month.name # vector of month name
>
> saveGIF({
+ monthly_loop <- for(i in seq(months)){ # loop printing a map for each months
+ filter(dengue, year == 2009, month == months[i]) %>%
+ select(province, contains("rate")) %>%
+ choromap(map, col = rev(heat.colors(6)), fixedBreaks = breaks_den) %>%
+ legend2(legend = ., col = attr(., "colors"), col_na = "grey", n_round = 1)
+ text(x = par("usr")[2], y = par("usr")[4], labels = paste0(months[i], " ", 2009), adj = c(1, 1))
+ }
+ }, movie.name = "animation_monthly.gif", interval = 0.5, ani.width = 580)
[1] TRUE
With the same principle, we can create another loop which prints a choropleth map for each month for each year:
> saveGIF({
+ yearly_loop <- for(j in seq(range(dengue$year)[1], range(dengue$year)[2])){ # loop for each year
+ monthly_loop <- for(i in seq(months)){ # monthly loop
+ filter(dengue, year == j, month == months[i]) %>%
+ select(province, contains("rate")) %>%
+ choromap(map, col = rev(heat.colors(6)), fixedBreaks = breaks_den) %>%
+ legend2(legend = ., col = attr(., "colors"), col_na = "grey", n_round = 2)
+ text(x = par("usr")[2], y = par("usr")[4], labels = paste0(months[i], " ", j), adj = c(1, 1))
+ }
+ }
+ }, movie.name = "animation_dengue.gif", interval = 0.1, ani.width = 580)
[1] TRUE